A Text Classifier Based on Sentence Category VSM
Abstract
The vector space model (VSM) is a mature text representation model for categorization. Words are commonly used as the dimensions of the VSM feature space, but they carry little semantic information. Sentence category theory, an important component of HNC theory, provides abundant information about the meaning, structure, and style of a sentence. We use sentence categories as the dimensions of the feature space, reduce the dimensionality by splitting mixed sentence categories, and adjust the weights with the tfc-weighting algorithm. A simple vector distance calculation then yields the classifier parameters and performs the categorization. The average precision and recall of our classifier are acceptable and can be improved with other HNC techniques.
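The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes that each text has already been reduced to a list of sentence-category labels (the labels `S1` to `S5` are hypothetical placeholders, not real HNC codes), that "tfc" denotes the usual tf-idf weight with cosine (length) normalization, and that the "simple vector distance calculation" is a nearest-centroid comparison using cosine similarity. The function names are likewise invented for the example.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Compute log(N / n_k) for every feature k seen in the training texts."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each feature once per text
    return {f: math.log(n_docs / n) for f, n in doc_freq.items()}


def tfc_vector(doc, idf):
    """tfc weight: tf * idf, divided by the Euclidean length of the vector."""
    tf = Counter(f for f in doc if f in idf)  # ignore features unseen in training
    raw = {f: count * idf[f] for f, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {f: (w / norm if norm else 0.0) for f, w in raw.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def train(docs, labels):
    """Return the idf table and one centroid vector per class."""
    idf = fit_idf(docs)
    grouped = {}
    for doc, label in zip(docs, labels):
        grouped.setdefault(label, []).append(tfc_vector(doc, idf))
    centroids = {}
    for label, vectors in grouped.items():
        total = Counter()
        for vec in vectors:
            for f, w in vec.items():
                total[f] += w
        centroids[label] = {f: w / len(vectors) for f, w in total.items()}
    return idf, centroids


def classify(doc, idf, centroids):
    """Assign the class whose centroid is most similar to the text."""
    vec = tfc_vector(doc, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy data: each text is a sequence of hypothetical sentence-category labels.
train_docs = [
    ["S1", "S1", "S2"],
    ["S1", "S3"],
    ["S4", "S4", "S2"],
    ["S4", "S5"],
]
train_labels = ["news", "news", "essay", "essay"]

idf, centroids = train(train_docs, train_labels)
print(classify(["S1", "S1", "S3"], idf, centroids))  # news
```

Because the feature space is the set of sentence categories rather than the vocabulary, the vectors stay short, which is why plain dictionaries are sufficient here.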
Similar resources
Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination
Chemical Named Entity Recognition (NER) is the basic step for subsequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, and extraction of the names of molecules and their properties. Improvement in the performance of such systems may affect the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...
Approach for Text Classification Based on the Similarity Measurement between Normal Cloud Models
The similarity between objects is a core research area of data mining. In order to reduce the interference caused by the uncertainty of natural language, a similarity measurement between normal cloud models is applied to text classification research. On this basis, a novel text classifier based on cloud concept jumping up (CCJU-TC) is proposed. It can efficiently accomplish conversion between qualita...
A Customizable Text Classifier for Text Mining
Text mining deals with complex and unstructured texts. Usually a particular collection of texts that is specific to one or more domains is necessary. We have developed a customizable text classifier for users to mine the collection automatically. It derives from the sentence category of the HNC theory and corresponding techniques. It can start with a few texts, and it can adjust automatically ...
A Novel Approach for Ontology-Based Feature Vector Generation for Web Text Document Classification
The task of extracting the used feature vector in mining tasks (classification, clustering ... etc.) is considered the most important task for enhancing the text processing capabilities. This paper proposes a novel approach to be used in building the feature vector used in web text document classification process; adding semantics in the generated feature vector. This approach is based on uti...
A Vector Space Model for Subjectivity Classification in Urdu aided by Co-Training
The goal of this work is to produce a classifier that can distinguish subjective sentences from objective sentences for the Urdu language. The amount of labeled data required for training automatic classifiers can be highly imbalanced, especially in the multilingual paradigm, as generating annotations is an expensive task. In this work, we propose a co-training approach for subjectivity analysis i...